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ON THE EVIDENCE FOR COSMIC VARIATION OF THE 

FINE STRUCTURE CONSTANT: 
A B AYES IAN REANALYSIS OF THE QUASAR DATASET 

By Ewan Cameron* and Tony Pettitt 
Queensland University of Technology 

We review the evidence behind recent claims of spatial variation 
in the fine structure constant deriving from observations on ground- 
based telescopes of ionic absorption lines in the light from distant 
quasars. To this end we expand upon previous non-Bayesian anal- 
yses limited by the assumptions of a strictly Normal and unbiased 
form for the "unexplained errors" of the benchmark quasar dataset. 
Through nested importance sampling and the method of power pos- 
teriors we evaluate and compare marginal likelihoods (or Bayes fac- 
tors) for three competing hypotheses — (i) the strict null (no cosmic 
^—1 variation), (11) the monopole null (a constant Earth-to-quasar offset 

Oh only), and (ill) the monopole+dipole hypothesis (featuring a cosmic 

variation manifest to the Earth-bound observer as a North-South 
divergence) — under various alternative error terms. Our analysis re- 
veals significant support for a skeptical interpretation in which the 
apparent dipole effect is driven solely by systematic errors of oppos- 
ing sign inherent in measurements from the two telescopes employed 
to obtain the observations. In this context we highlight the impor- 
tance of new observations along the equator of the alleged dipole (in 
addition to the new polar observations planned) in order to more 
strongly test the skeptical interpretation. 

1. Introduction. Introduced in 1916 by Arnold Sommerfeld to abbre- 
viate a recurring, dimensionless factor in his relativistic extension of the 
Bohr model for the hydrogen atom the fine structure constant, a, is now 
recognized as one of the fundamental coupling terms of quantum electrody- 
namics (QED). In this context a serves to characterize the strength of the 
electromagnetic interaction and has been humorously dubbed, "the Peter 
Sellers of QED" (Borie, 1986), owing to the many different roles it plays 
within this theory: setting the scale for all electromagnetic cross-sections, 
binding energies, and decay rates; and, of course, the scale of fine structure 
splitting in atomic and ionic spectral lines (cf. Griffiths 2005). 

Defined by the ratio of the squared elementary charge, e 2 , to the per- 
mittivity of free space, £q, the reduced Planck constant, H, and the vacuum 
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speed of light, c, 



(1) 



a = 




the fine structure constant has an on-Earth laboratory value, known to re- 
markable precision through exacting experimentation on quantum scale sys- 
tems, of a" 1 « 137.035999037(91) (Bouchendira et al., 2011; Hanneke, Fog- 
well and Gabrielse, 2008). Though nominally designated a "constant" there 
are in fact substantial theoretical foundations to support a time-/space- 
varying a, if empirical evidence of such can be definitively established; most 
notably, within electrodynamic scalar field models (Bekenstein, 1982; Car- 
roll, 1998) and grand unification theories (Marciano, 1984; Brax et al., 2003). 
An historical evolution (with respect to cosmological time) of the fine struc- 
ture constant driven by a decreasing speed of light could even offer a com- 
pelling solution to the infamous "Horizon Problem" 1 of Big Bang cosmol- 
ogy without inflation (Moffat, 1993; Barrow, 1999; Albrecht and Magueijo, 



Intriguingly, there exist a number of different approaches, both terrestrial 
and astronomical, through which one may search for this experimentalist's 
"Holy Grail". These include: (i) the in-laboratory comparison of optical 
atomic clocks (Fortier et al., 2007; Rosenband et al., 2008); (n) the nuclear 
modeling of samarium isotopes extracted from the Oklo natural fission re- 
actor (Shlyakhter, 1976; Damour and Dyson, 1996; Gould et al., 2006) or 
rhenium extracted from Earth- fallen meteorite samples (Olive et al., 2004); 
(in) the cosmological modeling of angular fluctuations in the temperature 
and polarization of the Cosmic Microwave Background (Rocha et al., 2004; 
Nakashima, Nagata and Yokoyama, 2008; Calabrese et al., 2011); and/or (iv) 
the search for frequency shifts in a-sensitive features of astronomical spec- 
tra, most notably the fine structure emission lines of ionized carbon observed 
in radio waves from distant lensed galaxies (Levshakov et al., 2012; Weiss 
et al., 2012) and the ionic absorption lines of various species observed in the 
visible/near- visible light from distant quasars (Webb et al., 1999, 2001; Mur- 
phy et al, 2001; Murphy, Webb and Flambaum, 2003; Webb et al., 2011; 
King et al., 2012). After a number of initial disagreements between rival 
teams — see Lamoreaux and Torgerson (2004), for instance — now resolved, 
only the post-Millennial quasar-based studies at present claim to have recov- 

1 Namely, the remarkable homogeneity of the Universe beyond even the scale of casual 
connection under the standard model; that is, over physical separations exceeding the 
maximum distance traversable since the beginning of time at the present-day speed of 
light. 



1999). 
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ered any significant evidence for an evolving fine structure constant. How- 
ever, the Webb et al. team's interpretation of their quasar dataset in this 
regard remains a matter of ongoing controversy; the statistical analysis of 
which is the subject of this paper. 

First identified in 1966 (Burbidge, Lynds and Burbidge, 1966; Stockton 
and Lynds, 1966) the distinctive absorption lines apparent to Earth-bound 
observers at UV-to-optical wavelengths in the light from distant quasars (or 
"quasi-stellar radio sources") arise from the interaction of this light with 
ions of various species encountered during its passage through interven- 
ing gas clouds; these residing in the extensive halos of both bright, star- 
forming galaxies and dark proto- galaxies. The potential of these absorption 
lines (especially the singly and triply ionized silicon [Sill and Siiv] dou- 
blets) as probes of the fundamental constants at extra-galactic distances, 
complementary to the already-used emission line technique (Savedoff, 1956; 
Bahcall and Salpeter, 1965), was quickly realized by Bahcall, Sargent and 
Schmidt (1967), and used to constrain the cosmic variation of a to within 
~10% of its on-Earth value (Aa/a = 0.98[0.05]) in the direction of one 
particular quasar (3C 191). (Throughout both the astronomical literature 
and our analysis herein the particular notation, Aa/a, is used to repre- 
sent the fractional offset of the fine structure constant in the extra-galactic 
system under study from its on-Earth laboratory value, i.e., Aa/a = [a — 
a Earth]/ a Earth-) Improvements in astronomical instrumentation have since 
allowed far stronger constraints on a variation to be established through 
this approach. Ivanchik, Potekhin and Varshalovich (1999), for instance, 
have demonstrated |Aa/a| < 2.3 x 10 -4 (95% CI) in the early Universe (at 
~4 Gyr after the Big Bang; ~9.4 Gyr ago) through Siiv doublet observa- 
tions along nine quasar sightlines from a ground-based Russian telescope 
(the BTA-6 at the Special Astrophysical Observatory). 

To search for cosmic a variation below this limit (of roughly one part 
in ten thousand) with quasar absorption line spectra requires application of 
the "many multiplet" (MM) technique (Dzuba, Flambaum and Webb, 1999) 
in which the relative frequency shifts of multiple ionic species are simulta- 
neously compared; the transitions of those species relatively insensitive to a 
variation (e.g. singly ionized magnesium [Mgn]) effectively serving as cali- 
bration benchmarks for the stronger shifters (e.g. singly ionized iron [Fell]). 
Implementation of the MM technique necessarily involves a multi-parameter 
fit to constrain not only Aa/a but also a suite of nuisance parameters ac- 
counting for the physical structure of the intervening gas cloud (i.e., column 
density, kinetic temperature, and the dominant line broadening mechanism). 
Pioneering this technique in 1999 the Webb et al. team (Webb et al., 1999) 
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recovered tentative evidence that a may have been lower in the past from a 
sample of 30 extra-galactic absorbers spanning 0.5 < z a b s < 1-6 (here z a b s 
denotes the cosmological redshift 2 of the absorber (s) under study) observed 
with the Keck telescope (Aa/a = —1.1 [0.4] x 10~ 5 ). This exciting result 
was quickly heralded as support for a certain class of cosmological model 
admitting a decelerating speed of light (Barrow and Magueijo, 2000; Davies, 
Davis and Lineweaver, 2002), though the claimed theoretical basis for the 
well-publicized Davies et al. interpretation — derived from a (mistaken) con- 
sideration of black hole thermodynamics — was soon disproven (Carlip and 
Vaidya, 2003). More interestingly from a statistical point of view was the 
Webb et al. team's concern for a possible underestimation of the true er- 
rors in their quoted uncertainties, with a remarkably strong dip in their a 
estimates over a narrow redshift interval (0.9 ^ -^abs 

< 1.1) suggesting the 
presence of an unexplained source of observational error. 

Further post-Millennial studies by the same team (Webb et al., 2001; 
Murphy et al., 2001; Murphy, Webb and Flambaum, 2003) with the Keck 
telescope — expanding their original sample to a total of 141 absorbers and 
their redshift baseline to 0.2 < z a b s < 3.7 — ultimately strengthened the ap- 
parent weight of evidence for a time- varying fine structure constant beyond 
the "4<T level"; given the particular assumptions of the statistical analysis 
employed. Chief among these the absence of any bias in the afore-mentioned 
unexplained error term — its presence still inferred from a marked excess of 
the sample variance over that expected from known sources of observational 
noise, and seemingly greater among the high redshift population. Murphy, 
Webb and Flambaum (2003) thus directed their attention at this time to- 
ward a number of possible explanatory factors for an additional error term 
unique to high redshift systems — including the entry of damped Lyman- 
alpha absorbers (dense clouds of neutral hydrogen featuring complex veloc- 
ity structures more challenging to model via the MM technique) into the 
sample at z a b s > 2 (where the rest-frame ultra-violet of their characteristic 
spectral lines is redshifted within the optical window accessible to on-Earth 



2 The cosmological redshift — literally a frequency shift "redwards" (toward longer 
wavelengths) apparent to Earth-based observers in the light received from extra-galactic 
systems due to cosmological expansion of the intervening space — serves as a proxy for 
lookback distance (or lookback time) in astronomy: the greater an object's observed red- 
shift the greater the physical separation between us and it at the time the light received 
on Earth was emitted, and the longer this light has had to travel to reach us. Integral 
formulae relating these physical quantities (subject to a specified parameterization of the 
standard cosmological model) are provided by Hogg (1999). For reference, we note that 
2 = corresponds (of course) to the present day, z = 1 to a lookback distance, r(z), of 
r;7.8 GLyr, and z = 3 to r(z) w 11.4 GLyr. 
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observers). Yet another, the imprint of a spatial variation in a (as we de- 
scribe below), was discounted at this stage as only weakly supported by 
the available data (according to a bootstrap significance test); see Murphy, 
Webb and Flambaum (2003). 

It was thus a great surprise in 2010 when new Act/ a estimates for 154 
(primarily southern hemisphere) absorption systems along 60 quasar sight- 
lines (52 new and 8 in common with the original Keck sample) observed by 
the same team (King et al., 2012; Webb et al., 2011) — but this time using 
the Very Large Telescope (VLT) in Chile — appeared to indicate the opposite 
evolutionary trend with redshift. Namely, that the fine structure constant was 
in fact higher in the past for these absorbers (Aa/a = 0.154 [0.132] x 10~ 5 ). 
To resolve this contradiction the team were forced to resurrect the spatial 
variation hypothesis, proposing a smooth transition in a across the Uni- 
verse manifest to the on-Earth observer as a (z-invariant) monopole plus a 
(z-dependent 3 ) dipole field. We illustrate graphically in Figure 1 (over page) 
the nature of the spatial variation in Aa/a under the best-fit model of this 
form (cf. Section 2.3) from King et al. (2012) on a color-/symbol-coded 
map of the celestial sphere (represented here via a Molleweide projection in 
the equatorial coordinate system of right ascension and declination 4 ). Note 
the scale of the inferred variation, which is at the Aa/a < 10 -5 level only 



3 King et al. (2012) in fact consider a variety of candidate functional forms for their 
dipole model, including a (redshift) z-invariant dipole, a z -dependent dipole, and an 
r(z)-dependent dipole (with r(z) the cosmological lookback distance). All exhibit (boot- 
strap randomization-based) statistical significances of ~4cr over a monopole-only null. 
However, we note that: (i) a z-invariant dipole strictly implies a breaking of the Coperni- 
can principle — namely, that the Earthican observer does not occupy a privileged position 
within the cosmos — in addition to the cosmological principle — namely, that the physical 
laws of the Universe are uniform throughout; and consequently that the Universe appears 
essentially homogenous when viewed on sufficiently large scales — broken by all; and (n) of 
the other two, the z* 3 - alternative (which encompasses a close approximation to the r(z)- 
model at (3 « 0.3) already appears from the King et al. (2012) study to be an over-fitting 
of the available data. Hence, we focus exclusively on the (monopole+)r(z)-dipole scenario 
in the present analysis. For reference, this was also the approach taken by Berengut, Kava 
and Flambaum (2012). 

4 The equatorial coordinate system charts the position of astronomical bodies on the 
celestial sphere with respect to the Earth's equatorial plane. Right ascension (typically 
quoted in units of "time" from to 24 hours) measures the angular distance along the 
equator from the vernal equinox to the (uniquely-specified) great circle passing through 
the designated object and the celestial poles, while declination (typically quoted in degrees 
from -90° to 90°) measures the corresponding angular distance along this great circle from 
the equator. Due to the slow, but astronomically appreciable, precession of the Earth's 
orbital axis it is necessary to define an epoch of reference for this coordinate system; 
the present convention is to quote all equatorial coordinates as if observed at noon on 1 
January 2000 (denoted the J2000 epoch). 
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accessible (as noted earlier) via the MM technique. 



The Monopole + r(z)-Dipole Model 

z = 1 , r(z) = 7.8 GLyr 



The Monopole + r(z)-Dipole Model 

z = 3, r(z) = 11.4 GLyr 



Fig 1. Visualization of the monopole-hr(z) -dipole model (cf. Section 2.3) for explaining the 
apparent spatial variation of the fine structure constant proposed by the Webb et al. team. 
The fractional difference of the fine structure constant from its on-Earth value under this 
model, Att/a mo d, as a function of the observational sightlme at z = 1 under the best-fit 
solution of King et al. (2012) is shown in the lefthand panel and that at z — 3 in the 
righthand panel (with the color- /symbol- coding explained in the top-right legend of each). 
The maximum, minimum, and equator of the key North-South dipole component of this 
model are overlaid as well for reference. 



In Figure 2 (over page) we present a comparable visualization of the "raw" 
Aa/a variation in the Webb et al. team's quasar dataset from which the 
apparent dipole effect shown above was inferred. To this end we compute in 
each bin of right ascension and declination containing at least one quasar 
sightline (but typically two or more; noting as well that there are on average 
2-3 absorbers per sightline) the weighted mean, 

-— -_E(A«/a)/(^ bs + 

^ran) 

Here (j ran represents the standard deviation of the unexplained error term 
in each of the three £ran 

groups under the Webb et al. team's proposed 
generative model (which we detail in Section 2.2). Despite the substantial 
degree of noise evident in these measurements (note the increase in scale 
with respect to that of Figure 1) one may yet discern an excess of negative 
Aa/a estimates in the far North and an excess of positive Aa/a estimates 
in the far South as per the dipole hypothesis. (Though, in anticipation of 
the skeptical interpretation of these results [i.e., that biases of opposite sign 
in the measurements from each telescope are to blame for the apparent 
North-South divergence], we note also that the equator of the alleged dipole 
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provides a near perfect subdivision of the sample into its VLT and Keck 
constituents.) 



The Quasar Dataset Aa/oc 
(King et al. 201 2; Webb et al. 201 1 ) +3 □ 




Fig 2. Visualization of the apparent dipole signature in the King et al. (2012) / Webb 
et al. (2011) quasar dataset. Each quasar sightline in the sample is marked here on an 
equal-area Molleweide projection of the sky based on the J2000 equatorial coordinate sys- 
tem; with solid-symbols denoting VLT observations and open symbols Keck observations. 
Note that for most of these quasars there are multiple intervening absorbers providing inde- 
pendent measurements of Aa/a along the sightline. The apparent spatial variation of this 
observable is illustrated via a color- /symbol- coding (detailed in the top-right legend) of its 
weighted mean in each of the indicated subdivisions. The maximum, minimum, and equa- 
tor corresponding to the best-fit monopole+r(z) -dipole solution of the King et al. (2012) 
paper are overlaid as well for reference. 

The intriguing hypothesis of a spatial variation in the quasar dataset with 
monopole+r(z)-dipole form has been identified by King et al. (2012) through 
a bootstrap randomization (of Aa/a estimates across sightlines and surveys) 
as favored over the simple monopole (i.e., time-/space- invariant with con- 
stant [non-zero] Earth-to-quasar offset) alternative at the "4<r significance 
level" — a result echoed by Berengut, Kava and Flambaum (2012) in their 
review of the dataset using the Akaike Information Criterion, the F statistic, 
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and the "error ellipsoid method" . With the new VLT sample apparently just 
as subject to unexplained measurement noise as the original Keck dataset, 
however, a number of strong assumptions were required to justify the signif- 
icance testing procedures employed in this work. In particular, the authors 
were obliged to treat the aforesaid as strictly Normal and strictly unbiased 
(viz. zero mean and mode). Assessing the impact of these assumptions, which 
are easily relaxed within a Bayesian framework as we demonstrate herein, 
thus represents an essential "next step" in the analysis of this observational 
benchmark given the profound implications for both fundamental physics 
and cosmology if the dipole interpretation is to become accepted. 

Following the extensive world-wide media coverage of their work 5 the 
Webb et al. team's spatial variation hypothesis was strongly criticized in a 
number of public forums, including the popular science blogs, "Uncertain 
Principles" by Chad Orzel and "Cosmic Variance" by Sean Carroll. The 
former citing the evident alignment of the alleged dipole's equator with the 
coverage overlap of the Keck and VLT telescopes (cf. Figure 2) as an indica- 
tor that biases of opposite sign in the observations from each might well be 
to blame; and the latter citing the extraordinarily low mass-energy budget 
(~10~ 42 GeV) for the underlying scalar field implied by the cosmic expanse 
of the fitted dipole. From a Bayesian perspective these objections may be 
viewed as disagreements over the relative prior probabilities of competing 
proposals. Namely, the degree to which the particular model for the un- 
explained error term adopted by the experimenters should be favored over 
some alternative model (or family of models), and the degree to which the 
null hypothesis (of a time-/space- invariant a) should be favored "theoreti- 
cally" over the proposed dipole hypothesis. 

In this paper we aim to address the above concerns explicitly within the 
Bayesian framework by computing, and thereby comparing, the marginal 
likelihoods of the dipole hypothesis, the monopole-only null hypothesis, and 
the strict null hypothesis (of Act /a everywhere zero) under both the Webb et 
al. team's nominal error model and a selection of more skeptical alternatives. 
In Section 2 we summarize the contents of the quasar dataset, discuss the ex- 
plained and unexplained sources of uncertainty in these measurements, and 
detail the functional form of the proposed monopole+r(z)-dipole. We then 
motivate our exploration of non-Normal and biased error terms through an 
inspection of the residuals about the best-fit dipole solution from King et al. 
(2012) and a comparison of Aa/a estimates for "twice-observed" absorbers. 
In Section 3 we propose data-driven error models for the quasar dataset 

See, for example, the 23 October 2010 issue of New Scientist magazine (#2783), or 
the 2 September 2010 issue of The Economist magazine. 
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and specify priors on their controlling parameters. We then specify priors 
on the fundamental parameters of the competing hypotheses, and explore 
the resulting posterior probability densities through Markov Chain Monte 
Carlo analysis. In Section 4 we compute marginal likelihoods for each model 
via nested importance sampling and the method of power posteriors, and 
examine the sensitivity of the corresponding Bayes factors to our priors. 
Finally, in Section 5 we discuss the implications of these results in terms 
of the evidence, or lack thereof, for a spatial variation in the fine structure 
constant, and the prospects for future observations to settle this debate. 

2. The Quasar Dataset. In this study we examine the publicly-available 
quasar dataset of Webb et al. (2011): a single catalog listing for each of the 
295 intervening absorbers (141 Keck plus 154 VLT) identified along the 131 
quasar sightlines probed to-date their MM-based estimates of z a b s , Aa/«, 
and cr bs; the last a standard deviation characterizing the uncertainty in 
each Aa/a estimate arising from explained sources of observational error. 
Also listed are the J2000 identifier (encoding right ascension and declina- 
tion in the equatorial coordinate system) and redshift, z qua , of each back- 
ground quasar, the telescope on which the corresponding observation was 
made, and the e ran group to which that observation has been assigned for 
error analysis — namely, VLT, Keck low contrast (LC), or Keck high contrast 
(HC). An additional flag highlights two "suspect" absorbers in this catalog — 
the z abs = 1-542 system toward J00048-415728 and the 

z abs — 2.84 system 

toward J194454+770552. These objects were omitted from the Webb et al. 
team's most recent work, and so for consistency are omitted from our reanal- 
ysis as well. The quasar dataset thus described is available in electronic form 
from Professor Michael Murphy's homepage at the Swinburne University of 
Technology. 

2.1. Explained Errors. An astronomical spectrograph, such as HIRES on 
Keck or UVES on the VLT, uses a high-resolution diffraction grating to iso- 
late distinct wavelength components of the incoming light, projecting these 
onto a charge-coupled device (CCD) for digital intensity measurement. The 
capture of a raw astronomical spectrum thus constitutes a photon counting 
experiment in which the count in each pixel of the CCD represents a sin- 
gle draw from a Poisson distribution with mean value set by the integrated 
flux intensity of light within a narrow wavelength interval from the observed 
source plus a contaminating contribution from the background sky and even 
possibly other non-target emission (e.g. stray light from other bright sources 
in the field of view); all modulated (i.e., non- linearly transformed) by the in- 
herent response characteristic of the telescope and device. In the Webb et al. 
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team's analysis multiple such raw spectra for each quasar were sourced from 
the Keck and VLT archives and reduced (i.e., combined and "cleaned") to 
single sequences of (high quality) absorption line profiles ready for modeling. 
The reduction process necessarily involves a number of complex transforma- 
tions of the raw data (including "stacking" , "flat fielding" , "bias frame sub- 
traction" , "wavelength calibration" , "cosmic ray removal" , and "continuum 
subtraction" ; the essential cleaning and calibration procedures of precision 
CCD photometry), which ultimately correlate to some extent both nearby 
and not-so-nearby pixels in the output profile sequence. However, given the 
almost overwhelming complexity of accounting robustly for each stage of the 
data reduction pipeline the extent of this correlation is assumed negligible — 
as is standard practice in the field 6 — with each pixel in the reduced spectrum 
ultimately treated as independent of all others and assigned an observational 
standard deviation, <7j. 7 

As noted earlier the MM method for deriving from the reduced spectrum 
a Aa/a estimate for each intervening absorber involves a complex multi- 
parameter fit across its signature absorption lines. Due to both natural (col- 
lisional) and Doppler (thermal and/or turbulent) broadening, the imprint 
of each line will be typically spread out over multiple resolution elements 
and a profile function thus required to characterize its shape and recover 
its centroid (i.e., the precise redshift of that component). The Voigt profile 
function 8 — characterized by the three controlling parameters: redshift, col- 
umn density, and velocity width; with the latter constrained jointly (as the 
kinetic temperature) across all species detected in the thermal broadening 
case and independent in the turbulent case — is well-established as the nat- 
ural basis for this procedure. However, the number of such bases required 
to accurately model a particular system, itself a function of the number of 
distinct "velocity components" present in the intervening cloud, is of course 
unknown a priori. The Webb et al. team therefore elected to fit each absorber 
sequentially, starting with a single Voigt profile function for each line and 

Substantial research efforts are, however, underway to develop Bayesian techniques for 
modeling more faithfully the complete observational reduction process; see, for example, 
the recent thesis on this topic by Bosch (2011). 

7 For further (highly technical) details of this reduction process and the rules applied 
for "error bar propagation" therein the interested reader may refer, for instance, to the 
official UVES data reduction cookbook available online. 

8 The Voigt profile function (Kendall, 1938; Nason, 2006) corresponds in shape to the 
probability density of a continuous random variable defined as the sum of two indepen- 
dent random variables of standard forms — a Normal and a Cauchy. For a more detailed 
description of the use of the Voigt profile function in the modeling of astronomical spectra 
the interested reader may refer to (in particular, Appendix A of) Michael Murphy's PhD 
thesis also available online. 
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adding components until a "statistically acceptable" solution was achieved 
(King et al., 2012); with the benchmark for acceptability characterized by 
a l/of -weighted sum of squared residuals (WSSR) approximately equal to 
the number of pixels fit minus the number of free parameters (dubbed, "the 
degrees of freedom", in their analysis though the fitted model is in fact 
non- linear). The known limitations of the WSSR as a stopping criterion for 
model selection (cf. Mallows 1995; Andrae, Schulze-Hartung and Melchior 
2010) may well contribute to the unexplained errors of the quasar dataset 
discussed below. 

Of those Voigt profile models (both thermal and turbulent) tested up to 
this maximum number of components the model with minimum Akaike In- 
formation Criterion (corrected for small n) was then selected as the best 
choice for refitting with Aa/a (now) a free parameter. Having located 
(via a downhill gradient-type search) the WSSR-minimizing (likelihood- 
maximizing) solution to this final model the Webb et al. team estimate 
the magnitude of their explained error term from the local curvature of the 
likelihood surface in Aa/a; adopting the Normal approximation, e b s ~ 
A^(0, o"obs)- ^he va hdfty of this approximation to the shape of the likelihood 
surface in the stated context has been confirmed via Markov Chain Monte 
Carlo (MCMC) exploration, as described by King et al. (2010). 

2.2. Unexplained Errors. As mentioned in the Introduction this estimate 
of the uncertainty in Aa/a derived during profile fitting and governed by 
the (photon counting) realization noise of the observed spectrum has been 
declared insufficient by the Webb et al. team to explain the apparent noise 
level in the quasar dataset. Numerous possible reasons for this have been 
proposed and discussed at length by Murphy, Webb and Flambaum (2003) 
and King et al. (2012). These may be divided broadly into two classes: (i) 
methodological limitations of the fitting procedure rendering inadequate the 
adopted e b s distribution, and (il) hitherto unmodeled sources of additional 
random and/or systematic error. From the first class we note the neglected 
impact of structural uncertainty in the profile fit with regard to the choice of 
line broadening mechanism and number of velocity components to include; 
with perhaps a tendency toward underestimation of the latter owing to the 
known pitfalls of the WSSR stopping criterion adopted. From the second 
class we have the chance blending of lines from absorbers at different red- 
shifts, and/or run-to-run errors in the wavelength calibration solution (King 
et al., 2012). Dedicated investigations of the latter by Griest et al. (2010) and 
Whitmore, Murphy and Griest (2010) have exposed the presence of intra- 
order velocity shifts against iodine cell reference spectra imaged on both the 
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HIRES and UVES spectrographs of sufficient magnitude to perhaps impact 
on estimates of the fine structure constant; see also the novel asteroid-based 
calibration study of Molaro et al. (2008). 

To account for the inadequacy of e b s in their analysis the Webb et al. 
team suppose the existence of a Normally-distributed error term, e ran ~ 
A^(0, of an ), adding strictly unbiased (zero mean and mode) noise to their 
Aa/a estimates (Webb et al., 2011; King et al., 2012; Berengut, Kava and 
Flambaum, 2012). For each of their monopole and dipole hypotheses tested 
against the quasar dataset the authors constrain the magnitude (viz. stan- 
dard deviation) of this error term, o" ran , through a Least Trimmed Squares 
(LTS) procedure (Rousseeuw, 1984) in which both its estimate and the best- 
fitting set of hypothesis parameters (under a WSSR-based likelihood func- 
tion including this additional error term) are refined in an iterative manner. 
For the specific monopole+r(2:)-dipole hypothesis examined herein (Equa- 
tion 2 below) the authors estimate <7 ran « 0.858 x 10" 5 for their VLT sub- 
sample (or e ran group), cr ran « 1.630 x 10 -5 for the sub-sample of their 
Keck absorbers with high contrast observations (the Keck HC e ran group), 
and <7 ran ~ for the remainder of their Keck absorbers with low contrast 
observations (the Keck LC e ran group). 

Hence, under the Webb et al. team's final model for both their explained 
and unexplained uncertainties the total error in each Aa/a estimate is 
treated as e to t ~ -^"(O^tot) where of ot = a% bs + of an (with <7 obs unique 
to that absorber and <7 ran shared across its e ran group). After outlining for 
reference the mathematical form of the monopole+r(z)-dipole hypothesis 
below we proceed to examine the <7 bs- and <7 to t-scaled residuals about its 
best-fit solution from the King et al. (2012) study as a means of evaluating 
(qualitatively) this simple error treatment. 

2.3. Mathematical Form of the Dipole. The monopole+r(z)-dipole hy- 
pothesis considered by King et al. (2012) and Berengut, Kava and Flambaum 
(2012) takes the functional form: 

(2) Aa/a mod \ x . }0m =m + B x r(zi) x cos(0)[rai, decj, ra d , dec d ] 

with Xi = {raj, decj, r(zj)} the vector of explanatory variables for the ith 
absorber, G m = {m, B, i&d, dec,i} a vector of input model parameters, and 
cos((/>)[-] a function returning the cosine of angular separation between the 
observational sightline and dipole vector. Given right ascension in hours and 
declination in degrees the latter may be computed as: 

(3) cos(^) = sin(decj) sin(decd) + cos(decj) cos(decd) cos(15 x [ra^ — raj). 
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The lookback distance (cf. Hogg 1999) for each absorber, r(zj), under the 
standard cosmological model (here we adopt Qm = 0.3, J7a = 0.7, and 
Hq = 70 km s _1 Mpc -1 ) is given (in units of GLyr; Giga-Light-years) by: 



o 



(4) r(zi) = 13.98 x / (1 + zf)y/0.3 x (1 + z'f + 0.7 



-i 



dz' . 



For reference, we note that the monopole-only null hypothesis corresponds 
simply to the case of Equation 2 with B = — that is, Aa / a mc ,d|0 m ={m} = m 
(a non-zero constant); with Aa/o! mo d|0 m ={ m =o} = its strict null counter- 
part. 

2.4. Possible Non-Normality of Residuals. In the top row of Figure 3 we 
present histograms of the residuals about the best-fit (i.e., 1/ofot WSSR- 
minimizing) solution to the monopole+r(z)-dipole hypothesis reported by 
King et al. (2012), scaled by the magnitude of each absorber's raw obser- 
vational error term, cr b s , and subdivided by telescope (or, more precisely, 
£ran group) and redshift. To limit any sensitivity to the precise functional 
form for the dipole hypothesis assumed herein we include only those sys- 
tems lying within ±10° of its equator where the contribution of the dipole 
component is smaller than or equal to the magnitude of the fitted monopole 
component (\m\ = 0.187 x 10 -5 ). If only the explained uncertainties (cf. 
Section 2.1) represented by £ Q b s ~ J\f(0,a^ hs ) were in operation here (and 
the monopole+r(,z)-dipole hypothesis both true and well fit), then the dis- 
tributions of these scaled residuals should, of course, themselves be standard 
Normal. However, this seems unlikely to be the case (as acknowledged by 
the Webb et al. team's invocation of the unexplained error term) for ei- 
ther the VLT residuals — which exhibit an unusual bimodality — or the Keck 
residuals — which exhibit a strong negative skewness. These departures from 
the standard Normal are further emphasized by the Q-Q plots shown in the 
top-right inset of each panel. 

As described in Section 2.2, under the Webb et al. team's model for the 
unexplained uncertainties in the quasar dataset the total error on each ob- 
servation remains zero mean Normal, £t t ~ A/"(0, cr^ ot ), but takes a broader 
standard deviation (at least in the case of the VLT and Keck HC e ran groups) 
of 0"tot = (<7Q bs + ofan) 1 ^ 2 - Rescaling the VLT and Keck HC residuals ac- 
cordingly we recover those histograms shown in the bottom row of Figure 3. 
While the scaled residuals now bear a greater similarity to the standard Nor- 
mal, from a critical perspective the rough agreement seen here could hardly 
be considered reassuring, especially given the lack of any strong justification 
for the supposed form of £tot- By including here a number of plausible alter- 
natives to the Webb et al. team's nominal error model in our reanalysis of 
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Fig 3. Motivation for our consideration of non-Normal error terms. Each panel contains 
a histogram of the residuals, [Aa/a — Aa/a mo d], with respect to the best-fit parameteriza- 
tion of the monopole+r(z)-dipole hypothesis of King et al. (2012), scaled by the standard 
deviation of the raw observational error term only in the top row, a bs, and that of the 
total error term adopted by Webb et al., (o~o bB + ofan) , in the bottom row. To limit any 
sensitivity to the precise functional form adopted for the dipole we include only those sys- 
tems lying within ±10° of its equator (VLT sources on the left; Keck on the right), and 
we further subdivide each sample by redshift at z = 1.5. We draw the reader's attention to 
departures in a number of these histograms from the standard Normal form that would be 
expected if the measurement errors were governed entirely by Normal terms of the magni- 
tude assumed in each row; in particular, the bimodal split about zero in the VLT residuals, 
and the negative skewness in the Keck residuals. We further emphasise these departures 
from Normality via the Q-Q (residual vs. Normal) plots shown as insets in the top-right 
corner of each panel; the range here (not marked) is exactly -3 to 3 on each axis. 



the quasar dataset we are able to gauge the influence of the assumed form 
for the unexplained errors on the apparent dipole significance (cf. Sections 
3 to 5). 

2.5. Twice- Observed Absorbers. As noted earlier although the Keck and 
VLT telescopes cover generally well-separated regions of sky — the Hawaii- 
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based Keck being largely restricted to targets in the northern hemisphere, 
and the Chile-based VLT restricted to targets in the southern hemisphere — 
there nevertheless appears in the Webb et al. dataset an eight quasar in- 
tersection of sightlines common to both (evident in Figure 2 as the star- 
shaped symbols arising from the overlap of the up-triangle and down-triangle 
symbols used respectively for the VLT and Keck datapoints). Among the 
corresponding population of intervening absorbers thus probed there are 
eleven twice-observed; that is, detected at matched spectroscopic redshifts 
in the spectra from both instruments (which feature quite different observa- 
tional selection functions — owing to the unique resolution and wavelength 
coverages of each — and hence do not necessarily return identical sets of ab- 
sorbers). The pairwise comparison of Keck and VLT Aa/a estimates for 
these twice-observed systems offers a "first-order" test of the unbiased error 
hypothesis questioned by Chad Orzel and others skeptical of the Webb et 
al. team's conclusions. 

The eleven scaled differences, [Aa/aKcck-Aa/avLT]/(o"tot,Kcck+ (T tot,VLT) 1 
so recovered form the ordered list: {—1.64, —1.40, —0.97, —0.75, —0.56, —0.52, 
- 0.34, -0.09, 0.17, 0.27, 0.30} (rounded to the 2nd decimal place). The pre- 
dominance of negative differences observed here (eight of the eleven) hints at 
the presence of systematic biases in the Aa/a values recovered from one or 
both of these telescopes; and indeed a Wilcoxon signed-rank test (Wilcoxon, 
1945) allows rejection of the zero mean difference hypothesis at the 5% sig- 
nificance level (p = 0.017). Though admittedly not of overwhelming power 
due to the small sample size available this result nevertheless serves as fur- 
ther motivation (in addition to the alignment of the alleged dipole's equator 
along the coverage overlap of the Keck and VLT telescopes) for the biased 
error models we consider in our Bayesian reanalysis of the quasar dataset 
below. 

3. Generative Models & Posteriors. Having established the impor- 
tance of a Bayesian reanalysis of the quasar dataset, and for exploring in 
this context both unbiased and biased error terms, in the Introduction and 
Section 2, respectively, we now proceed to outline formally the generative 
models and priors required for this endeavor. 

3.1. Error Models. In the interests of thoroughness we explore four al- 
ternative forms for the total error term operating in each £ ran group; each 
candidate form allowing a quite different distribution for the unexplained 
uncertainty component. For reference we denote these, A, B, C, and D, 
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respectively, and define as follows: 

(A) £ to t|/3ran, ^ran ~ A/"(/3 ran , ^obs + °fan)> 

(B) e t ot|/3ran,o" ran ~ V(/3 ran , a ohs , (7 ran ) [V denotes the Voigt distribution], 

(C) etotlC ran ; ^ran > Wran ~ A/" s kcw(6an, V ran W ran /yJ (7^ + W^ an , a^ hs + W^ an ), 

A/"(-7ranO-obs,o-obs) witn probability r/ ran , 
A/"(7ranO-obs,o-obs) otherwise. 



(D) £tot|??r 



Error form A corresponds, of course, to the scenario of a Normally- 
distributed source of measurement noise, £ran ~ A/"(/3ran) ^"ran)' Operating 
in addition to, and independently of, the explained error, £ Q b s ~ AA(0, <7 2 bs ). 
At /3 ran = this equates to the strictly unbiased (i.e., zero mean and mode) 
error term adopted by King et al. (2012); and we therefore consider this lim- 
iting case the nominal error form (Ao) for our analysis, treating the more 
general /3 ran ^ case as the default skeptical form. 

Error form B, on the other hand, corresponds to the (undesirable) scenario 
of a heavy-tailed source of Cauchy-distributed measurement noise with lo- 
cation, /3 ran , and scale, a ran , i.e., £ ran ~ C(/3 ran , cr ran ) where /c|/3 ran , ( 7 ran ( 2; ) = 
1/-7T x <7 ran / (of an + (x — /3 ran ) 2 ), operating in addition to, and independently 
of, the explained error. As noted in our earlier discussion of absorption line 
modeling (Section 2.1), the Voigt profile function (cf. Kendall 1938; Nason 
2006), V, provides an analytical expression for the density of such a distri- 
bution arising from the convolution of a Cauchy and a Normal. Namely, 



Re[u}(z)] _ (x - /3 ran ) + i o-y 



iic \^ HvanJ T '"ran 
/v|£ ran ,<x obs ,a raI >) = J==, Z = j= , -OO < X < OO, 



where uj(-) in the above denotes the complex error (or "Faddeeva") func- 
tion. By allowing, through error form B, for the possibility of heavy-tailed 
behaviour in the unexplained error term we help to ensure the robustness of 
our conclusions under the Normal error form (A) against the possible pres- 
ence of extreme outliers in the quasar dataset. Though the mean remains 
undefined for the Cauchy, and thus the Voigt as well, we note that /3 ran = 
nevertheless gives a zero mode case (Bo), which we refer to as "unbiased" 
(remembering in our subsequent discussion that this is not strictly so) and 
which we examine separately to the more general (skeptical) /3 rari ^ case. 

Error form C supposes instead a source of skew Normal noise, £ran rs " / 
A/"skew(£ran 5 ^ran, ^r ari )) operating in addition to, and independently of, the 
explained error. A tendency toward bias in the wavelength calibration solu- 
tion for a given £ran group may be one explanation for such an dsyuiTYictvic 
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distribution in the uncertainties of the quasar dataset. We note here the 
definition of the skew Normal with normalized shape parameter, v, scale, w, 
and location parameter, £, in terms of the standard Normal density, </>(•), 
and distribution function, $(•), for reference: 

2 fx — £\ / v(x — £) \ 
(5) f \f \f n ,„2(x) = —(f) I ) 3> I . I , — oo < x < oo, 

with — 1 < v < 1 and w > 0. We note also that the sum of a (zero 
mean) Normal, A/"(0, of an ), and a skew Normal, A/" s kew(£ran) ^ram w ran)i * s 
again skew Normal, with parameters as specified in our definition of error 
form C above. This may be verified, of course, by reference to the char- 
acteristic function of the skew Normal distribution; according to Pewsey 
(2000), V/^&v,**)® = exp(i# - wH 2 /2){\ + ir(vwt)} where t(x) = 

y/2/Tr exp(u 2 /2)du for x > and r(— x) = —t(x). We consider once 
again a so-called "unbiased" (here zero mode, but not mean) case (Co), 
constructed by solving numerically for £ mo d e for each a b s and {^ran, i^ran} 
parameter pairing, in addition to the skeptical case of free £ ran . 

Last of all, error form D corresponds to a scenario of systematic mis- 
estimation in which Aa/a is alternately under-shot or over-shot by 7 ran x 
o" bs, with probability r/ ran of the former (and 1 — r] ra _ n of the latter). Such 
a bimodal error distribution might well arise as the result of routine mis- 
identification of the dominant line broadening mechanism (turbulent or ther- 
mal) or the number of velocity components present when modeling the ob- 
served absorption profile (cf. Section 2.1); and was suggested by the distri- 
bution of residuals around the equator of the proposed dipole for the VLT 
sample (examined in Section 2.4). In the interests of thoroughness, though 
perhaps not terminological consistency, we also consider an "unbiased" (here 
zero mean, but not mode) case of this error form (Do) with fixed rj = 0.5. 

In Figure 4 we illustrate the nature and diversity of all four candidate 
forms (A, B, C, and D) for the total error term, £t t) under a variety of 
arbitrary choices for their governing parameters. As in earlier studies by the 
Webb et al. team, each of the three £ ran groups of the quasar dataset — 
namely, the VLT sample, the low contrast (LC) Keck sample, and the high 
contrast (HC) Keck sample — is treated separately here for error analysis 
and assigned one of the above. Throughout we employ the notation, AAA, 
BoBoBo, CAD and so on, to indicate which error form has been assigned 
to each — Keck LC, Keck HC, and VLT, in that order. We refer specifically 
to the combination, AAA, as our default skeptical model and to the com- 
bination, AoAoAo, as the nominal error model. With each candidate form 
admitting two or three governing parameters we must therefore specify a 
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Fig 4. Illustration of the four candidate forms (A, B, C, and D ) proposed here for mod- 
eling the total error term, £tot, of the quasar dataset. We specify a standard deviation, 
o"obs = 1 x 10 -6 , for the explained uncertainty component, e bs ~ A/"(0, c^ bB ), in these 
plots, and we indicate the diversity of densities thereby admitted under each corresponding 
total error term by tracing the ouptut for a variety of arbitrary inputs to its governing pa- 
rameters (detailed in the top-right legend of each panel). For clarity, the "unbiased" cases 
shown of these forms (Aq, Bo, Co, and Do) are clearly distinguished (via a reduced line 
thickness) from their general (skeptical) counterparts. 



total of between six and nine priors for a complete error model. 

3.2. Error Model Priors. In constructing priors on the governing param- 
eters of our error models we are guided by the logical observation that the 
magnitude of the unexplained error term must be of roughly the same or- 
der as the explained error term in this dataset (cr bs ~ 2 x 10 -5 ); if it were 
very much smaller it would presumably have gone unnoticed, while if it were 
very much larger the Webb et al. team would have been unlikely to have 
proceeded to hypothesis testing without a better understanding of its ori- 
gin. The same follows from inspection of the o"ran~ 

scaled residuals along the 
equatorial strip shown in Figure 3. In addition, when using biased forms for 
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£tot (cf. Section 3.1 above) we specify a priori that any bias in the Keck 
(LC and HC) error terms must be negative and any bias in the VLT er- 
ror term positive as per the skeptical interpretation of the quasar dataset. 
[Note that we also restrict the prior range of the dipole vector to southern 
hemisphere pointings in the biased error case in accordance with the rival 
hypothesis — that there remains a similarly oriented dipole field driving the 
apparent North-South trend in Aa/a; cf. Section 3.3 below.] Thus, for the 
parameters, /3 ran and cr ran , in our default skeptical error form (A), which 
govern the mean/mode and standard deviation, respectively, of the unex- 
plained term in this case, we suppose vr(/3 ran ) ~ A/"h a if(0, [0.5 x 10~ 5 ] 2 ) and 
7r(crran) ~ U(0, 5 x 10" 5 )— with the half Normal prior on /3 ran positive for 
the VLT e ran group and negative for the Keck LC and HC e ran groups. 

To promote fairness in the marginal likelihood-based comparison of each 
model-hypothesis pairing we have attempted to make our priors on the 
parameters of the remaining three error forms as similar as possible to those 
of form A. Thus, for error form B we suppose 7r(/3 ran ) ~ A/"h a if(0, [0.5 x 
10~ 5 ] 2 ) for the mode of the underlying Cauchy (with the same choice of 
sign for each e ran group), but allow a (slightly) larger range of 7r(<7 ran ) ~ 
U(0, yf 2 log(2) x 5 x 10~ 5 ) on its scale parameter to match our effective prior 
on the full width at half maximum (FWHM) of this error form to that of 
A (the standard deviation being undefined for the Cauchy, and thus not 
available for comparison). For error form C we begin by setting our prior 
on the unnormalized skew parameter, v Tan /\/l — "y 2 an , of this distribution 
to a generous, AA(0, [10] 2 ); which transforms to the following density on the 
normalized skew parameter, i> ran , with a = 10: 



fa) = /^72 exp 



-x 2 



2a 2 (l - x 



2) 



(l_ x 2)3/2' 



1 < X < 1. 



We then match our prior on the standard deviation of the unexplained error 
term in C to that of A by setting 7r('u; ran |v ran ) ~ U(0, (1 — ^ v 2 an ) -1 ^ 2 x 
5 x 10~ 5 ); and likewise to match our priors on the corresponding means 

we choose, 7r(^ ran |« ran , w ian ) ~ jV ha if (-v Tajs w iajl ^/f , [0.5 x 10~ 5 ] 2 ) signed as 
above. Finally, for error form D, for which no precise matching of this 
nature is possible, we simply adopt 7r(7 ran ) ~ U(0, 2.5) and n(r] 

ran ) ^ 

Beta half (15, 15) (i.e., "folded" about 0.5 [/^ <0 ,or/* 
with ?/ ran > 0.5 assumed for the Keck e ran groups and ?? ran < 0.5 for the 
VLT e ran group), favoring a near-symmetric £ ran over a markedly asymmet- 
ric one. For the typical absorber with explained error, <7 b s ~ 2 x 10 -5 , this 
gives a prior expectation on the bias of the unexplained error term in each 
£ran group comparable to that of our form A benchmark. 
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3.3. Hypothesis Priors. We apply similar principles with regard to con- 
structing priors on the input parameters, = {to, B, ra^, dec^}, of the Webb 
et al. team's proposed monopole+r(2:j)-dipole hypothesis for the apparent 
cosmic variation of Aa/ a (see Equation 2 in Section 2.3). For the strength 
of the monopole term, to, we adopt a zero mean Normal prior with standard 
deviation, 0.5 x 10~ 5 , and for the strength of the dipole term, B, an expo- 
nential prior with rate, l/[0.5 x 1CT 5 ]. That is, vr(m) ~ W(0, [0.5 x 10 -5 ] 2 ) 
and tt(B) ~ Exp(l/[0.5 x 10" 5 ]). Supposing a priori for all "unbiased" error 
models that the dipole vector might point anywhere on the celestial sphere 
with equal probability we assign the appropriate uniform priors to both ra^ 
and the sine of dec^. Thus, vr(rarf) ~ U(0, 24) and 7r(sin(dec c ;)) ~ U(— 1, 1), 
i.e., 7r(dec,i) ~ cos(dec,i)/2. While for the general (skeptical) case of each 
model we restrict the dipole declination to the southern hemisphere (i.e., 
decrf < with 7r(decd) ~ cosdec^) as noted earlier. 

For consistency with the above we also suppose 7r(m) ~ AA(0, [0.5 x 10~ 5 ] 2 ) 
for the strength of the monopole in the monopole-only version of the null 
hypothesis; recall though, our strict null hypothesis is that Aa/a remains 
everywhere zero. 

3.4. Likelihood Function. With any stochastic variation in the Aa/a 
estimates of the quasar dataset assumed to arise solely from the combination, 
£ to t> of the explained and unexplained error terms described by the error 
models of Section 3.1 — and with the realization of this measurement error 
for any given absorber assumed independent of all others — the likelihood 
function for the observed data, y = {Aa/ai : i = 1, . . . , 293} with covariates 
represented as Xi = {ra^, dec,, r(zj)}, corresponding to a given Aa/a mo & and 
e tot pairing, M, may be written as: 

293 

(6) L(y\6 = {0 m ,0 e },M,Xi) = ]J A tot |o c , £ran (Aa/a, - Aa/ a mod \ Xi ^ m ). 

i=l 

Here 6 m = {m, B, ra^, dec^} (dipole) [or {m} (monopole null), or {m = 0} 
(strict null)] denotes the set of input hypothesis parameters, fc tot \e e ^ I!ini 
the etot probability density for the particular e ran group to which the ith 
absorber belongs, and 6 e the vector of between six and nine parameters 
required for a complete error model. 

3.5. Posteriors. For each combination of error model tested here (see 
Section 4 below for a complete enumeration) and hypothesis (strict null, 
monopole null, and dipole) we have constructed a Markov Chain Monte 
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Carlo sample of 100,000 draws from the posterior using a symmetric per- 
turbation, # pr0 p ~ A/"(0j, S prop ), for the proposal. The diagonal variance 
matrix, Sl pr op, of which was initialized to our prior variance on each param- 
eter and progressively rescaled toward a target acceptance rate of 0.4 during 
a 5,000 draw burn-in phase. For each final output sample the corresponding 
autocorrelation function has been computed to verify convergence and an 
effective sample size of > 10,000. Although the marginal likelihoods of each 
model-hypothesis pairing — which we derive only indirectly from these poste- 
riors (see Section 4 below) — are in fact the sole quantities of interest for our 
Bayesian assessment of the claimed spatial variation in the quasar dataset 
we nevertheless review briefly here the nature of our posterior inferences for 
key error model and hypothesis parameters as an aid to understanding the 
ensuing results. 

In Figures 5 and 6 we illustrate the marginal posterior probability densi- 
ties of those parameters (or combinations thereof) governing the width (viz. 
standard deviation, or half width half maximum as appropriate in the case 
of model B) resulting from each error form (A/Ao to D/Do, as described 
in Section 3.1) applied homogeneously across all £ ran groups under both the 
strict null and dipole hypotheses. (Our posteriors under the monopole-only 
null hypothesis, and for the two non-homogeneous error model combinations 
we test, CAA and CAD [cf. Section 4 below], being roughly intermediate 
to these representive examples.) Although the inferred width of the unex- 
plained error term for each e ran group varies somewhat depending upon the 
error model adopted, the relative ordering of the aforesaid — Keck LC, VLT, 
Keck HC (from smallest to largest) — does not. For the "unbiased" case of 
each model in particular (Figure 6: AoAoAo to DoDoDo) we note that the 
error widths recovered under the null hypothesis are generally somewhat 
greater than those recovered under the (more flexible) dipole hypothesis — 
that is, the proposed error forms must necessarily "stretch" to encompass 
the broader spread of residuals in the former. Interestingly, the Bayesian 
analysis we present here for the nominal error model (AoAoAo) favors a 
non-zero <7 ran of roughly 0.6xl0~ 5 (posterior mean) for the Keck LC e ran 
group under the strict null hypothesis, in constrast to the cr ran w Least 
Trimmed Squares (LTS) estimate reported by King et al. (2012) in their 
Table 2. This discrepancy presumably arises from the deliberate focus on 
interquartile width at the expense of (or, purportedly in robustness against) 
any marked outliers in the latter approach. Our estimates for the other e ran 
groups under both the null and dipole hypotheses are nevertheless in fair 
agreement with their LTS counterparts. 

In Figure 7 we present the marginal posterior probability densities of the 
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Fig 5. Posterior probability densities for the key parameters (or combinations thereof) 
governing the underlying error distribution width (viz. standard deviation, or half width half 
maximum as appropriate in the BBB case) of the e ran component in each of our general 
(skeptical) error models: AAA to DDD (homogeneous) . In the interests of illustrative 
clarity only those posteriors corresponding to the strict null (i.e., Act /a everywhere zero) 
and dipole hypotheses are compared here (our results for the monopole-only null hypothesis 
being roughly intermediate to these two). Interestingly, although the inferred width of the 
unexplained error term for each £ ran group varies somewhat depending upon the error 
model adopted, the relative ordering of the aforesaid — Keck LC, VLT, Keck HC (from 
smallest to largest) — does not. 



parameters (or combinations thereof) governing the bias (viz. offset of the 
mode, or in case D the mean, from zero) in each of our general (skeptical) 
error models (applied homogenously across all £ ran groups) under both the 
strict null and dipole hypotheses. As explained in Section 3.2 the posteriors 
for the Keck (LC and HC) and VLT 

£ran groups are ascribed priors restrict- 
ing the sign of bias to negative in the former and positive in the latter in 
line with the skeptical interpretation of the quasar dataset; hence the solid 
black divides marked at zero on this Figure. The degree of bias favored by 
these posteriors for each £ ran group varies little between error models but 
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Fig 6. Posterior probability densities for the key parameters (or combinations thereof) gov- 
erning the underlying error distribution width (viz. standard deviation, or half width half 
maximum as appropriate in the BoBoBo case) of the £ ran component in each of our "un- 
biased" error models: AoAoAo to DoDoDo (homogeneous). Again, although the inferred 
width of the unexplained error term for each £ ran group varies somewhat depending upon 
the error model — and hypothesis — adopted, the relative ordering of the aforesaid — Keck 
LC, VLT, Keck HC (from smallest to largest) — does not. As expected the error widths re- 
covered under the null hypothesis here are generally somewhat greater than those recovered 
under the (more flexible) dipole hypothesis. 



greatly between hypotheses — as expected, larger biases are generally pre- 
ferred under the strict null hypothesis than under the dipole hypothesis, 
owing to the inherent (though of course only partial) degeneracy between 
the North-South dipole and opposing bias scenarios. 

Finally, for reference we plot in Figure 8 the joint posterior densities of the 
key parameter pairings under the monopole+r(z)-dipole hypothesis — viz. 
monopole strength-dipole strength (m—B) and right ascension-declination 
of the dipole vector (ra^-decd) — for error models AAA, AoAoAo, CCC, 
and CqCoCq. While the posteriors corresponding to each error model in 
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Fig 7. Posterior probability densities for the key parameters (or combinations thereof) 
governing the error distribution bias (viz. offset of the mode, or in case D the mean, from 
zero) in each of the biased error models: AAA to DDD (homogeneous). Once again in 
the interests of illustrative clarity only those posteriors corresponding to our strict null 
hypothesis and dipole model are compared here. As explained in Section 3.2 the Keck (LC 
and HC) and VLT £ ran groups are ascribed priors restricting the sign of their bias to 
negative in the former and positive in the latter in line with the skeptical interpretation 
of the quasar dataset; hence the solid black divides marked here at zero. The degree of 
bias favored by the results for each £ ra n group varies little between error models but greatly 
between hypotheses — as expected, much larger biases are inferred under the strict null 
hypothesis than under the dipole hypothesis, owing to the inherent (though of course only 
partial) degeneracy between the North-South dipole and opposing bias scenarios. 



the "unbiased" case are both highly concentrated around the maximum 
likelihood solutions identified by the Webb et al. team under their fitting of 
the nominal error model (Ao Ao Ao) — namely, {ra^ 17.5 hours, dec^ « —60 
degrees} and {m ~ —0.2 x 10 -5 , B ~ 10 -6 } — in the biased (skeptical) case 
the corresponding posteriors are evidently more diffuse and favor markedly 
a smaller dipole strength. 

To discriminate between this plethora of plausible generative models with 
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Fig 8. Posterior probability densities for the four key parameters, 8 m = {m, B, ra^, deed}, 
governing the Webb et al. team's proposed monopole+r(z)-dipole hypothesis for the appar- 
ent spatial variation of Ace/ a in the fine structure dataset under the following alternative, 
example forms of the unexplained error term: AAA and AoAoAo (top row); and CCC 
and CoCoCo (bottom row). We illustrate each posterior via a series of smoothed contours 
(surrounding the posterior mode and) enclosing increasing fractions of the (marginal) pos- 
terior mass from 0.05 to 0.95 in both the m-B plane (giving the strengths of the monopole 
and dipole components, respectively) and the v&d-decd plane (giving the dipole direction 
on the celestial sphere). While the posteriors under each error model in the "unbiased" 
case are both highly concentrated around the maximum likelihood solutions identified by 
the Webb et al. team under their fitting of the nominal error model (A.oAoAo,) — namely, 
{rad ~ 17.5 hours, deed ~ —60 degrees} and {m « —0.2 x 10~ 5 , B ~ 10 -6 } — in the biased 
(skeptical) case the corresponding posteriors are evidently more diffuse and favor markedly 
a smaller dipole strength. 



quasi-degenerate parameters — and to thereby identify the strongest candi- 
date (s) for explaining the apparent spatial variation of the quasar dataset — 
we now proceed to the Bayesian comparison of marginal likelihoods (cf. 
Carlin and Chib 1995; Kass and Raftery 1995). 
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4. Marginal Likelihoods. For each combination of proposed error 
model and hypothesis we compute the corresponding marginal likelihood 
via both nested importance sampling (Chopin and Robert, 2010) and the 
method of power posteriors (Friel and Pettitt, 2008) — the former allowing 
the sensitivity of the resulting Bayes factors to our chosen priors to be rapidly 
assessed, and the latter providing an important verification of our evidence 
computations. 

4.1. Nested Importance Sampling. The marginal likelihood, Z, of the 
observed data, y, under a given model, M, is (most generally) defined as 
the integral of the corresponding likelihood function, L(y\9,M), over the 
prior density, ir(0\M)dO; thus, 



With 9 typically of high dimension, and the bulk of the likelihood concen- 
trated over a small volume of the prior domain, in practical applications 
one cannot hope for an accurate evaluation of Z via direct numerical in- 
tegration (viz. quadrature); so computational MC methods are most com- 
monly relied upon for this task (see, e.g., Friel and Wyse 2012 for a re- 
cent overview). Of these MC methods, nested sampling (Skilling, 2006), is 
perhaps the most popular among the physics and cosmology community 
at present (e.g. Mukherjee, Parkinson and Liddle 2006; Feroz, Hobson and 
Bridges 2009), owing to its surprising efficiency for cases in which the prior 
can be analytically transformed to an n-dimensional Uniform density. 9 

Nested sampling aims to estimate the marginal likelihood via a trans- 
formation of the variable of integration in Equation 7 to the prior mass 
cumulant differential, dX, defined 



JL(y\0,M)>\ 

As such, X(X) represents the prior probability of the system parameters 
lying within that restricted domain, {9 : L(y\9,M) > A}, for which the 

9 For instance, in cosmological applications the prior is typically assumed indepen- 
dently Uniform on each separate model parameter or the logarithm thereof. This allows 
the very fast (with respect to the MCMC evolution alternative) ellipse-based strategy 
of Mukherjee, Parkinson and Liddle (2006) to be applied at the constrained-likelihood 
sampling phase crucial to this algorithm (see below). No such transformation is possible 
for our priors on the controlling parameters of error model C in this study, however, as 
we have analytical forms available only conditionally for io ran |wran and £ran|v r an, w ran ; and 
likewise, for instance, in the case of the Normal-Gamma prior considered by Friel and 
Wyse (2012) in their review. 
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corresponding likelihood exceeds the input, A; and as a probability its range 
is simply zero to one. Supposing the existence of an inverse-defined function 
for the likelihood, L(X(X)) = A, (i.e., supposing L(y\6,M) continuous and 
£1(0) connected, cf. Chopin and Robert 2010) the nested sampling formula- 
tion of Equation 7 under dX becomes simply 



As Skilling (2006) demonstrates, through the maintainance of a reference 
particle population sampled from the prior under a non-decreasing likelihood 
constraint one may generate a sequence of cumulant-likelihood pairings, 
{Xi,Li}, with each known precisely and the corresponding Xi sequence 
known at least in distribution (decreasing near-geometrically toward zero). 
Given a sample from, or asymptotic sampling approximation to, this baseline 
Xi sequence the Z in Equation 9 may be conveniently approximated via a 
simple Riemann sum or trapezoidal rule (cf. the procedure given in Section 
6 of Skilling 2006); with a dominant stochastic error — easily estimable by 
repeated recomputation of Z under resampled Xi — for this process of order 
0(iV -1 / 2 ) governed by the size, N, of the reference population (Chopin and 
Robert, 2010). 

The computational efficiency of the nested sampling algorithm in practi- 
cal applications will most often be limited by one's ability to sample from 
the prior under a strict likelihood constraint. Unless the prior can be trans- 
formed to an ra-dimensional Uniform density as noted above it will be nec- 
essary to run an MCMC chain in order to explore (and thereby sample 
"unbiasedly" from) this constrained volume, starting from the position of 
an existing reference particle and finishing after sufficient moves to obtain 
approximate stationarity (Skilling, 2006; Friel and Wyse, 2012); resulting in 
an overhead of perhaps ~20 likelihood evaluations per {Xi,Li} pairing. As 
a means to take advantage of the accuracy provided by the geometric explo- 
ration of the cumulant baseline in nested sampling, while side-stepping the 
implementational difficulties of constrained likelihood exploration, Chopin 
and Robert (2010) have proposed nested importance sampling. In this al- 
ternative approach the true posterior is approximated via an instrumental 
prior, tt(6), and instrumental likelihood, L(9), with forms chosen to facilitate 
constrained likelihood sampling; and a weighted sum of the true likelihood 
function and true prior at each drawn 0j is then used to approximate Z. 

As in Section 5 of Chopin and Robert (2010) we adopt here a multi- 
variate Normal form for fc(9) with mean vector, p,, and covariance matrix, 
S, set to their sampling estimates from the MCMC output of each model- 
hypothesis combination tested. Supposing an arbitrary, decreasing function 



(9) 




imsart-aoas ver. 2012/04/10 file: finestructure.tex date: July 27, 2012 



28 



EWAN CAMERON & TONY PETTITT 



in quadratic 6 for L(0) thus allows the required sequence of Qi correspond- 
ing exactly to the nested sampling Xi under N reference particles to be 

1 /2 1 /2 

generated from Qi = fi + Cv/\\v\\ 2 , where C is the Choleksy lower 
triangle of S, v ~ A/"(0, J), and % = Q x 2^(e~ l ^ N ) [here Q x 2 represents the 
quantile function of the x 2 distribution of degree, d]. Both d and the dimen- 
sion of I in this expression are, of course, set equal to the dimension of Q. 
By running this nested importance sampling procedure on each of our error 
model-hypothesis pairings with TV = 500 and the end point of summation 
set to well beyond the point of diminishing returns at 5N iterations we find 
that the desired marginal likelihoods can be recovered to within an accu- 
racy of AlogZ = ±0.1 [standard error from repeat simulation] in negligible 
computational time — in particular, much faster than the power posteriors 
method — with one exception, the case of the dipole hypothesis fit under 
error model CoCoCo. 

The reason for this exception is that the posterior for this particular case 
is distinctly multimodal — featuring near equal mass peaks at positive and 
negative values of the normalized shape parameter for each £ ran group — 
meaning that our multivariate Normal instrumental prior is no longer a 
reasonable approximation. In principle it may be possible to extend nested 
importance sampling to the multimodal case through a strategy akin to mul- 
timodal nested sampling (Feroz and Hobson, 2007). However, for the pur- 
poses of the present analysis we rely solely upon our power posteriors-based 
estimate of the marginal likelihood for this specific error model-hypothesis 
pairing (CoCoCo under the dipole) — the power posteriors approach being 
applicable to a wider class of distributions including such multimodal forms. 

Finally, we note that by recording the sequence of Qi drawn along with 
each {Xi,Li} pairing and n(0i) density under nested importance sampling 
one may quickly recompute log Z in response to any "compatible" modifi- 
cation to the prior density — "compatible" meaning here technically that the 
support of the new prior is contained within that of the old, and practically 
(for error control purposes) that this modification does not too drastically 
reduce the effective sample size of the weighted sum. We specifically exploit 
this property of nested importance sampling in the present study (see Sec- 
tion 4.3) in order to check the sensitivity of certain key results to our chosen 
priors, thereby addressing to some extent potential concerns regarding sub- 
jectivity in this aspect of Bayesian model selection (cf. Chipman, George 
and McCulloch 2001). 

4.2. Power Posteriors. The power posteriors method for evaluating the 
marginal likelihood (cf. Friel and Pettitt 2008) takes a thermodynamic inte- 
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gration/path sampling approach (Gelman and Meng, 1998; Lefebvre, Steele 
and Vandal, 2010) to this normalization problem inspired by the methodol- 
ogy of theoretical physics. The power posterior at a given "temperature", t, 
is defined as the distribution: 



with normalization (in general) unknown except at t = 0. As demonstrated 
by Priel and Pettitt (2008) the direct marginal likelihood integral of Equation 
7 may be reformulated as an integral of the mean log likelihood over the 
family of power posteriors spanning t = to 1; that is, 



Thus, via MCMC sampling from each power posterior over a prescribed 
temperature schedule — representing (for the typical case of to = < t± < 
. . . < i n _i < t n = 1) a thermodynamic integration from the prior to the 
posterior — one can approximate the logZ of Equation 11 using (again) a 
Riemann sum or trapezoidal rule applied to the corresponding sequence of 
approximate mean log likelihoods. Here we apply a heating schedule of 40 
increments uniformly-spaced in \/t and draw 10,000 log likelihoods (after a 
1,000 step burn in phase) from each power posterior (to estimate (logL)^). 

4.3. Results. Marginal likelihoods were computed for ten alternative per- 
mutations of our four candidate error forms (Section 3.1) over the three 
e ran groups of the quasar dataset — in particular, the four general (skepti- 
cal) error forms applied homogeneously (AAA, BBB, CCC, DDD), their 
"unbiased" case counterparts applied likewise (AoAoAo, BoBoBo, CoCoCo, 
DoDoDo), plus two non-homogeneous permutations (CAD and CAA) mo- 
tivated by the observed distributions of residuals (A«/a- Aa/a mo d) about 
the King et al. (2012) dipole fit along the equatorial strip of Figure 3 — under 
each of the three competing hypotheses — namely, strict null, monopole- 
only null, and monopole+r(,2)-dipole. These results, totalling thirty separate 
marginal likelihood estimates, are compiled (in log Z format) in Table 1. 

The first point we note here upon consideration of the above is that for 
the nominal error model (AoAoAo) of the Webb et al. team — in which the 
unexplained source of error in each e ran group is assumed strictly Normal 
and strictly unbiased (zero mean and mode) — the log marginal likelihood of 
the dipole hypothesis (log Z = —612.2) exceeds that of both the strict null 
(log Z = —617.5) and monopole-only null (log Z = —617.4) alternatives by 
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Table 1 

Log Marginal Likelihoods for Each Error Model-Hypothesis Pairing Tested Against the 

Quasar Dataset 





AAA 


AoAoAo 


BBB 


BoBoBo 


ccc 


Strict Null 


-607.8 


-617.5 


-619.3 


-628.6 


-608.7 


Monopole 


-608.4 


-617.4 


-620.0 


-628.4 


-609.7 


Dipole 


-609.3 


-612.2 


-620.2 


-622.4 


-609.9 


( 


DoCoCo DDD 


DoDoDo 


CAD 


CAA 


Strict Null 


-617.1 


-623.4 


-627.6 


-614.8 


-607.8 




Monopole 


-615.1 


-624.0 


-628.4 


-615.4 


-607.9 


Dipole 


-612.4 


-616.2 


-618.4 


-616.2 


-608.5 



AlogZ ~ 5.2. Under the Jeffreys scale for interpretation of Bayes factors 
(B.F. = exp[AlogZ]) such an extreme result (a B.F. > 100) should be 
considered decisive evidence in favor of the former; broadly consistent at 
face value (i.e., neglecting for now both our results for alternative error 
models and any subjective priors one might hold on the relative merits of 
these hypotheses) with the Webb et al. team's bootstrap randomization- 
based estimate of ~4c support for spatial variation in the quasar dataset. 
[We say "broadly" as of course: (i) one cannot meaningfully compare p- 
values against Bayes factors in a quantitative sense; and (il) a log marginal 
likelihood offset of A log Z ~ 5.2 would in any case correspond only to ~2.5cr 
support on the one-sided Normal scale.] 

A further inspection of the results compiled in Table 1 reveals that this 
particular ordering of the three hypotheses by marginal likelihood recovered 
under the nominal error model (AoAo Ao) is in fact conserved across all three 
alternative error models tested in their "unbiased" cases — namely, BoBoBo 
(heavy tailed), CoCoCo (skew), and D D D (bimodal). But, one may also 
note that the greatest marginal likelihood recovered here for the [prima 
facie) preferred dipole hypothesis under any of these error models remains 
that of log Z = —612.2 for the nominal error model (unbiased, Normal 
noise); under the interpretation of the Bayes factor as Ockham's razor (cf. 
Jeffreys and Berger 1992; Jaynes 2003) we must conclude that there exists 
at present insufficient evidence to favor any of these more complex error 
models over the nominal in their "unbiased" forms. 

Turning now to the results in Table 1 for our general (skeptical) error mod- 
els, which expressly allow for the possibility of opposing systematic biases 
between the observations from the Keck and VLT telescopes, we discover 
for all but one permutation tested (DDD) a reversal of the above rank- 
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ing of hypotheses; that is, for error models AAA, BBB, CCC, CAD, and 
CAA the strict null and monopole-only null hypotheses are in fact preferred 
in marginal likelihood over the Webb et al. team's monopole+r(2:)-dipole 
hypothesis — although we note that the strength of this preference ranges 
only between "barely worth mentioning" and "substantial" on the Jeffreys 
scale. Furthermore, for three of these error models (AAA, CCC, and CAA) 
the resulting marginal likelihoods for all hypotheses are consistently greater 
than the maximum recovered for the dipole hypothesis under the nominal 
error model; with a tie for the greatest maximum likelihood of the present 
study (log Z = —607.8) between AAA and CAA under the strict null hy- 
pothesis. That we recover such support for a skew Normal term in the Keck 
LC e ran group is quite intriguing in that it may offer some insight into the 
origin of the unexplained errors of this dataset — in particular, it suggests a 
sole contributor, such as bias in the Keck wavelength calibration solution, 
rather than the accumulation of many disparate sources converging toward 
a Normal. Nevertheless, we have elected to focus the ensuring discussion on 
the tied AAA model as this was our a priori default skeptical proposal and 
is most similar in form to its (unbiased) nominal rival. 

As mentioned earlier, in order to investigate the sensitivity of these re- 
sults to our choice of priors on both error model and hypothesis parame- 
ters we make use here of an important property of the nested importance 
sampling algorithm — namely, that it allows for rapid recomputation (i.e., 
requiring no new likelihood evaluations) of the marginal likelihood under 
("compatible") modifications to the prior density. In Figure 9 we explore 
the subjectivity in this regard of our recovered Bayes factors (B.F.) compar- 
ing the monopole+r(z)-dipole hypothesis against the strict null. Specifically, 
in the left hand panel we illustrate the effect of modifying our prior "stan- 
dard deviation" (cr p ; the quotation marks referencing the fact that the o~ p 
hyper-parameter described here is not the true standard deviation for this 
half Normal prior) on the bias (/3 ran ), and our prior range (u q ) on the width 
(cran); of the £ ran component in our default skeptical error model (AAA); 
and we do the same for our prior standard deviation (a m ) on the monopole 
strength term (m), and our prior rate (u n ) on the dipole strength term (B), 
of the dipole hypothesis in the right hand panel. In each case it is evident 
that large modifications of our stated prior beliefs are required to overturn 
the weight of evidence against spatial variation in the fine structure constant 
under this biased error model — either a drastic rescaling of the prior density 
on the magnitude of this bias toward zero (i.e., moving back toward the 
unbiased alternative), or a significant tightening of our prior on the dipole 
strength term (bringing it toward its posterior mean, though we have no a 

imsart-aoas ver. 2012/04/10 file: finestructure.tex date: July 27, 2012 



32 



EWAN CAMERON & TONY PETTITT 




Error Model A 



B.F. : Null v Dipole 
© >5 
>1.5 

XXX ~ 1 

<1/1.5 
» <1/5 



oo 




0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 

o p /10- 5 :n((3 ran )~N half (0,o2) 



Error Model A 


B.F. : Null v Dipole 




© >5 




>1.5 




JxScXxx ~ 1 

-=1/1 .5 




<1/5 




loooooooo 



0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 

a m /10- 5 :n(m)~N ha i,(0,c£) 



Fig 9. Exploring the sensitivity to our chosen priors of the B ayes factor (B.F.) comparing 
the monopole+r(z) -dipole hypothesis against the strict null under the default skeptical error 
model (AAA). By re-weighting the summation of {Xi, Li} pairings under each new n(6i) 
in the nested importance sampling computation of log Z for each hypothesis we explore in 
the lefthand panel the effect of varying our prior "standard deviation" (o p ) on the bias 
(Pra,ii), and prior range (u q ) on the width (cr ran/ ), of the unexplained errors (dun); and we 
do the same for our prior standard deviation (a m ) on the monopole strength term (m), 
and our prior rate (u n ) on the dipole strength term (B), of the dipole hypothesis in the 
righthand panel. In each case it is evident that large modifications of our stated prior beliefs 
(indicated by the purple asterix overlaid on each plot) are required to overturn the weight 
of evidence against spatial variation in Aa/a under this biased error model. (Note that 
the Bayes factors for u q > 5 x 1CF 5 shown in the lefthand panel are shaded differently to 
indicate that the adopted rescalmg beyond this point is not technically "compatible" in the 
sense that the new prior support is not contained within that of the old; but this should 
not appreciably affect the accuracy of our marginal likelihood evaluations presented here 
as the likelihood for all (T ra n > 5 x 10~ J is itself exceedingly small.) 



priori theoretical motivation for such a small dipole strength 10 ). 

In summary, the marginal likelihood comparisons presented herein point 
toward the following key result: that the Webb et al. team's claimed "4cr 
support" for a dipole signature in the quasar dataset depends crucially upon 
their assumption that the unexplained error terms in the Keck and VLT sub- 
samples contribute only unbiased noise to the estimation of Aa/a. We now 
discuss these results more fully (i.e., in the context of our subjective priors 
on the plausibility of each error model-hypothesis pairing) with regard to 
the evidence, or lack thereof, for a spatially-varying fine structure constant 
in Section 5 below. 



"'indeed, Sean Carroll has argued (in his blog post mentioned in the Introduction) 
that if anything one would expect a far greater dipole strength considering the energy 
level implied for the underlying scalar field. 
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5. Discussion and Conclusions. Although (as mentioned in the In- 
troduction) there exist various exotic physical theories within which a time- 
and/or space- varying fine structure constant can be readily admitted (Beken- 
stein, 1982; Carroll, 1998; Marciano, 1984; Brax et al., 2003) for many physi- 
cists the prior probability of such must be considered small, given both 
the results of previous experiments and our faith in certain long-standing 
physical principles. In particular, past analyses of the Oklo natural fission 
reactor (Shlyakhter, 1976; Damour and Dyson, 1996; Gould et al., 2006), 
Earth- fallen meteorite samples (Olive et al., 2004), and optical atomic clocks 
(Fortier et al., 2007; Rosenband et al., 2008) have strongly favored a sce- 
nario of negligible temporal variation in a locally; with Rosenband et al. 
(2008), for instance, determining |Aa/a| < 1.6 x 10~ 17 per year at present 
on Earth. These results may "daringly" (with respect to the strong assump- 
tion of zero higher derivatives) be extrapolated to |Aa/a| < 2 x 10~ 7 since 
the Big Bang (~13.4 Gyr ago); with more prosaic but reliable constraints of 
|Aa/a| < 0.1 over this time provided by contemporary analyses of fluctua- 
tions in the Cosmic Microwave Background (Rocha et al., 2004; Nakashima, 
Nagata and Yokoyama, 2008; Calabrese et al., 2011). 

Spatial variation in the fine structure constant, on the other hand, has 
been previously constrained to |Aa/o| < 10~ 4 on cosmic scales via both 
emission and absorption line based quasar studies (Bahcall and Salpeter, 
1965; Bahcall, Sargent and Schmidt, 1967; Ivanchik, Potekhin and Var- 
shalovich, 1999), although this may only be considered weak prior evidence 
against the Webb et al. team's specific claim as the dipole strength inferred 
from the quasar dataset contributes an effect well below this level. More im- 
portantly from an astronomical perspective is our prior belief in the cosmo- 
logical principle — namely, that the physical laws of the Universe are uniform 
throughout; and consequently that the Universe appears essentially homoge- 
nous when viewed on sufficiently large scales. This principle not only serves 
as a key plinth of the predominant cosmological model but has been hith- 
erto well-supported by measurements of the fractal dimension (converging 
toward three) at large scales in galaxy redshift surveys (Martinez et al., 1998; 
Scrimgeour et al., 2012) and by the isotropy of the Cosmic Microwave Back- 
ground (Mandolesi et al. 1986; though see Eriksen et al. 2007 and Clarkson 
and Barrett 1999). The proposal of a large-scale spatial variation across the 
observable Universe in one of the fundamental coupling constants of QED is 
thus in direct conflict with this trusted paradigm. In conjunction with Sean 
Carroll's previously-noted objections (from his "Cosmic Variance" blog; see 
Introduction) regarding the surprisingly low mass-energy budget implied by 
such a smoothly varying scalar field we must from this perspective assume 
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a significant prior weighting against the dipole interpretation. 

To quantify this prior weighting we might for arguments sake suppose a 
preference approaching the 5a level for the established theory inspired by the 
agreed evidence benchmark in the particle physics community for a "definite 
discovery" of the Higgs boson 11 — that is, P pr i or (strict null) /P pr iov (dipole) « 
3.5 x 10 6 (from the one-sided Normal tail). In this conservative scenario, 
even the ~4<r support claimed by the Webb et al. team under their nominal 
error model (and broadly "confirmed" in this study with a Bayes factor of 
ssl80, "corresponding" to ss2.5<7 support) must be considered insufficient 
to conclude that the the fine structure constant varies across the Universe 
in the manner described. Indeed, such a prior weighting against the dipole 
hypothesis appears reflected in the relative caution with which both inde- 
pendent cosmologists (Olive, Peloso and Uzan, 2011) and the Webb et al. 
team themselves have at times discussed their results. Dedicated observa- 
tions targetting quasars at the poles of the inferred dipole have thus been 
proposed (Webb et al., 2011; King et al., 2012) in order to establish the 
further support neccessary for this hypothesis to become widely accepted. 

However, as we have already demonstrated in Section 4.3 above the present 
quasar dataset does not exclude (in fact it weakly favors) the skeptical inter- 
pretation that there exist sources of opposing systematic bias in the Aa/a 
estimates deriving from the Keck and VLT telescopes, artificially inducing 
the apparent dipole signature. And, crucially, such future observations 12 fo- 
cussed exclusively on the poles can do little to settle this debate, even if the 
nominal error model and dipole hypothesis are in fact correct. We demon- 
strate this explicitly via the following simulation procedure making use of 
our posterior predictive, the results of which are shown in Figure 10. First 
we draw a vector of 6 m parameters for the dipole hypothesis and 6 e param- 
eters for error model AoAoAo from the (present) posterior. Using these as 
input we simulate n VLT data values (Aa/a estimates) under this nominal 
error model for sightings along the maximum pole, and we assign each a 
z a bs and o"obs sampled (with replacement) from the observed distribution in 
the quasar dataset; and we do the same thereafter for n replicate Keck (LC) 
observations sighted along the minimum pole. We then add all these mock 
observations to the existing quasar dataset, and recompute marginal like- 
lihoods for the dipole hypothesis under AqAqAo and the strict null under 



11 Acknowledging that the 5<r significance benchmark of the particle physics community 
is in fact in reference to a long-run sampling p-value under the null. 

12 We assume these to be conducted on the same telescopes as the original dataset — 
namely, the Keck and the VLT — as these represent the most powerful (and efficient) 
facilities available at present for this type of spectroscopy. 
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AAA (using the very fast nested importance sampling algorithm). By re- 
peating these steps a thousand times over we thus estimate the distribution 
of such future (log) Bayes factors 13 under the proposed observing strategy 
(and error model-hypothesis pairing). As is evident from the lefthand panel 
of Figure 10, with even 250 new observations at each pole there remains 
less than a 15% chance of recovering more than 5c Bayes factor support 
for the dipole with nominal error model (assumed true) over the strict null 
with skeptical error model. [We note that the wide range of predicted Bayes 
factors recovered here arises primarily from the present range of permitted 
dipole strengths expressed by our posterior on parameter B (see Figure 8), 
which dominates over the (well known) sampling variance of the Bayes factor 
itself (cf. Good 1992; Jenkins and Peacock 2011).] 
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Fig 10. The predicted power of future quasar observations for dismissing the skeptical 
interpretation (of opposing biases in the Keck and VLT error terms) supposing the dipole 
hypothesis with nominal (unbiased) error model as truth. In the lefthand panel we illus- 
trate the simulated distribution of log marginal likelihood differences (i.e., log Bayes fac- 
tors) recovered after the addition of 50, 100, and 250 additional mock observations (Aa/a 
estimates) from each telescope directed along the corresponding pole given a random pa- 
rameterization of the monopole+r(z)- dipole hypothesis drawn from its present posterior; 
each subject to the appropriate noise term under model AoAqAo and sampled in z a b s and 
cr bs as per the original dataset. We present the same in the righthand panel, but for a 
scenario in which identical numbers of total mock observations are split over both polar and 
equatorial sightings. The latter strategy clearly offers far better prospects for dismissing the 
skeptical hypothesis (if indeed false) . For reference, the red, yellow, and green- shaded zones 
under each curve mark the AlogZ limits corresponding to >3a, >4o~, and >5a support, 
respectively. 



A more profitable strategy for distinguishing between the dipole under the 
nominal error model and the null under the default skeptical error model 

13 Trotta (2007) has proposed for this the term, "predicted posterior odds distribution", 
with acronym, PPOD. 
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would instead be to focus in part on adding further observations along the 
equatorial strip where the relative bias between these two instruments may 
be most readily constrained. We once again demonstrate this by repeat- 
ing our simulation procedure to predict future Bayes factors, but this time 
employing an observational strategy in which half the quasar sightings are 
placed on the equator (and half again on the poles). Our results are sum- 
marized in the righthand panel of Figure 10 from which it is evident that 
we now have a better than one-in-two chance of recovering the sought after 
5cj support for the nominal dipole hypothesis (assumed as truth) for the 
n = 250 experiment. 

Thus, we reach the final conclusions of our Bayesian reanalysis of the 
quasar dataset. Namely, that: (i) given both our incomplete understanding 
of the observational errors and our limited theoretical (prior) expectations 
regarding the properties of any spatial variation in the fine structure con- 
stant the present observational coverage (featuring limited overlap of the two 
telescopes for which opposing biases might be suspected) must be deemed in- 
adequate to properly distinguish the Webb et al. team's proposed dipole field 
from the (strict or monopole) null; and (n) one cannot afford to overlook the 
importance of observations along the equator of the alleged dipole in additon 
to those proposed along its poles when planning future campaigns (with the 
Keck and VLT telescopes) for the purpose of settling this debate. 

We have also demonstrated in this study the utility of the nested impor- 
tance sampling technique for marginal likelihood computation: in particular 
with regard to (i) the ease with which it allows for testing the sensitivity of 
the Bayes factor to the priors; and (il) its potential to facilitate prediction 
of the Bayes factor distribution for future observations (under the specifica- 
tion of a particular hypothesis as truth) without unreasonable demand on 
computational power or time. 
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